Polynomial Regression
Polynomial Regression is an extension of linear regression that models the relationship between the independent variable (X) and the dependent variable (Y) as a polynomial. It is used when the relationship between (X) and (Y) is nonlinear and cannot be adequately captured by a straight line.
The general form of the polynomial regression equation is:
Formula:
Y = b₀ + b₁·X + b₂·X² + b₃·X³ + ... + bₙ·Xⁿ + ϵ
Explanation:
- Y: Dependent variable
- X: Independent variable
- b₀: Intercept (value of Y when all Xᵢ = 0)
- b₁, b₂, b₃, ..., bₙ: Coefficients representing the effect of each term of X on Y
- X², X³, ..., Xⁿ: Polynomial terms of the independent variable X
- ϵ: Error term (the difference between the actual and predicted Y)
When to Use Polynomial Regression?
- The relationship between the dependent and independent variables is nonlinear.
- Higher-order polynomials are needed to better fit the data.
Example: Modeling a Nonlinear Relationship
Suppose we have data showing the performance of a machine at different levels of temperature:
Temperature (°C) | Performance (%) |
---|---|
10 | 40 |
20 | 50 |
30 | 65 |
40 | 80 |
50 | 95 |
60 | 70 |
70 | 55 |
80 | 40 |
Clearly, the relationship is not linear. A polynomial regression can better model this behavior.
Code Example
Here’s how to perform polynomial regression and plot the results:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Data
X = np.array([10, 20, 30, 40, 50, 60, 70, 80]).reshape(-1, 1)
Y = np.array([40, 50, 65, 80, 95, 70, 55, 40])
# Transform data to include polynomial features (degree 2)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
# Fit polynomial regression model
model = LinearRegression()
model.fit(X_poly, Y)
# Generate predictions
X_range = np.linspace(10, 80, 100).reshape(-1, 1)
X_range_poly = poly.transform(X_range)
Y_pred = model.predict(X_range_poly)
# Plot
plt.figure(figsize=(8, 6))
plt.scatter(X, Y, color='blue', label='Data points')
plt.plot(X_range, Y_pred, color='red', label='Polynomial Regression (degree 2)')
plt.title('Polynomial Regression Example')
plt.xlabel('Temperature (°C)')
plt.ylabel('Performance (%)')
plt.legend()
plt.grid(True)
plt.show()
Advantages
- Captures nonlinear relationships effectively.
- Provides more flexibility than linear regression.
Limitations
- Overfitting: High-degree polynomials may fit the training data well but fail to generalize.
- Computational Complexity: Higher degrees can make computations more resource-intensive.